Transportation Deserts and Transportation Accessibility: Sam Ding and Vichy Meas
Ethnic Enclaves and City Demographics: Juthi Dewan and Freddy Barragan
Given the size of our group, we decided to split our capstone project into two related analyses, giving each sub-group more control over its part of the project while maintaining meaningful collaboration. Broadly, we are interested in characterizing New York City’s (NYC) internal racial segregation using demography, geographic mobility, community health, and economic outcomes. First, we want to identify gaps in transportation access across NYC neighborhoods. Then, we consider which neighborhoods racial enclaves are forming in and what they imply about racial segregation in NYC. Below, we outline each sub-analysis.
Specific Aim: Classify transportation deserts using ridership, demographics, and socioeconomic status.
Outcome: Transportation Desertification (Low, Medium, High). This will be calculated via ridership measures per neighborhood.
Grouping Variables: Neighborhood & Zip Code
Covariates:
- Number of Bus Stops (Transportation Access)
- Number of Jobs Accessible by Transit (Transportation Access)
- Percent on Public Assistance (demographic)
- Median Income (socioeconomic)
Limitations:
- Ridership measures are temporally correlated
- Imperfect nesting of zip code in neighborhood
- The existence of transportation hubs that allow large groups of people to cross erratically.
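The three-level outcome above could be derived from ridership in several ways. The following is a minimal sketch, assuming tertiles of mean ridership are used as cut points and that low ridership is read as high desertification. Both choices are assumptions, and the `ridership` data frame and its values are hypothetical.

```r
# Hypothetical per-neighborhood ridership values (illustrative only).
ridership <- data.frame(
  neighborhood   = c("A", "B", "C", "D", "E", "F"),
  mean_ridership = c(1200, 45000, 8000, 300, 21000, 97000)
)

# Tertile cut points; using tertiles is an assumption, not a project decision.
breaks <- quantile(ridership$mean_ridership, probs = c(0, 1/3, 2/3, 1))

# Lowest ridership tertile is labelled "High" desertification (assumed direction).
status <- cut(
  ridership$mean_ridership,
  breaks = breaks,
  labels = c("High", "Medium", "Low"),
  include.lowest = TRUE
)

# Store as an ordered factor running Low < Medium < High.
ridership$desert_status <- factor(
  as.character(status),
  levels = c("Low", "Medium", "High"),
  ordered = TRUE
)

print(ridership)
```

Quantile-based cut points give equally sized groups by construction, so they describe relative rather than absolute transportation access.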
Specific Aim: How is the density of nonwhite communities in each neighborhood associated with transportation desertification, after accounting for neighborhood-level differences in socioeconomic status, community health, and demographics?
Outcome: Proportion of the community that is not white. This is calculated by subtracting the proportion of the white population from 1.
Predictor of Interest: Transportation Desert Status (Low, Medium, High)
Grouping Variables: Neighborhood & Zip Code
Covariates:
- Socioeconomic Status (by Neighborhood):
  - Median Asking Price of Craigslist/Street Easy Rentals
  - Percent on Public Assistance
  - Median Income
  - Number of Evictions
- Community Health (by Neighborhood):
  - Number of Parks
  - Number of Schools
  - Number of Evictions
  - Number of Food Retailers
- Transportation Access (by Neighborhood):
  - Number of Jobs Accessible by Transit
- Demographics (by Neighborhood):
  - Percent Female
  - Percent Non-citizen
Limitations:
- Rental Prices are temporally correlated and vary by month
- Racial/Ethnic demographic data only measures predominant racial groups (e.g. White, Black, Latinx, Native, Asian, and Pacific Islander).
- Imperfect nesting of zip code in neighborhood
- Differences in years of observation (e.g. Outcome data is from 2021, but other necessary data may be from 2018)
NOTE: We intend to first study nonwhite population density, then reproduce in stratified analyses for each racial group (e.g. White, Black, Latinx, Native, Asian, and Pacific Islander) to see what may affect community density differently by group.
We’re attempting to extend Raven McKnight’s capstone on transit ridership in the Twin Cities (McKnight, 2020), reports on neighborhood-level health disparities (Gracia et al., 2014; Ji et al., 2020), and previous work on gentrification, transportation deserts, and urban displacement in NYC (Chapple et al.) to study the relationships between race, income, transportation, and class-based disparities in NYC under a Bayesian framework. Specifically, we intend to fit spatial hierarchical models to identify gaps in transportation access and how they may be associated with racial density and segregation in NYC. Our question is well suited to a Bayesian analysis, given how naturally Bayesian hierarchical models accommodate nested structures such as zip codes within NYC’s neighborhoods. Further, a Bayesian framework gives us posterior information for every zip code, even where the data are sparse, missing, or unbalanced, as they certainly are for some neighborhoods. With respect to workflow, we intend to select and test multiple prior distributions for our models. We will first fit models with flat priors, then specify informative priors using previous models of these relationships or previous reports on NYC’s demographics.
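As a rough illustration of the kind of hierarchical model described above, one possible specification for the ethnic enclave analysis is sketched below. It is only a sketch under stated assumptions: zip codes $i$ are nested in neighborhoods $j$, the proportion nonwhite $Y_{ij}$ is modeled on the logit scale (which requires values strictly between 0 and 1), $D_{ij}$ is transportation desert status with Low as the reference level, $x_{ij}$ collects the remaining covariates, and the spatial correlation component is left out. All symbols are hypothetical notation, not a final model.

```latex
\begin{aligned}
\operatorname{logit}(Y_{ij}) \mid \beta_{0j}, \beta_1, \beta_2, \gamma, \sigma_y
  &\sim N\!\left(\mu_{ij},\, \sigma_y^2\right), \\
\mu_{ij} &= \beta_{0j}
  + \beta_1\, \mathbb{1}[D_{ij} = \text{Medium}]
  + \beta_2\, \mathbb{1}[D_{ij} = \text{High}]
  + x_{ij}^\top \gamma, \\
\beta_{0j} \mid \beta_0, \sigma_0 &\sim N\!\left(\beta_0,\, \sigma_0^2\right), \\
\beta_0, \beta_1, \beta_2, \gamma, \sigma_y, \sigma_0
  &\sim \text{priors (flat at first, then informed by earlier work)}.
\end{aligned}
```

The neighborhood-specific intercepts $\beta_{0j}$ are what allow zip codes in sparsely observed neighborhoods to borrow strength from the rest of the city.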
NYC Open Data is our largest source of data, given that it has detailed datasets from all of NYC’s divisions at the neighborhood level. However, turnstile data and rental data were obtained from the MTA and Craigslist/StreetEasy, respectively.
Below, we use our cleaned data to visualize our ethnic enclave predictors by neighborhood. Note, rental price data has taken substantially longer to join, so it is excluded here.
# Load packages
library(tidyverse)
library(janitor)
library(here)
# themes
theme_set(theme_minimal())
grocery <- read_csv(here("ethnic","Data","grocery.csv")) # coordinate
schools <- read_csv(here("ethnic","Data","schools.csv")) # census
green_space <- read_csv(here("ethnic","Data","parks.csv")) # coordinate
evictions <- read_csv(here("ethnic","Data","evictions.csv")) # census
access_income <- read_csv(here("ethnic","Data","transit_income.csv"))
demographics <- read_csv(here("ethnic","Data","demographics.csv"))
rental_price <- read_csv(here("ethnic","Data","median_rent.csv"))
grocery <- grocery %>%
dplyr::select(`License Number`, County, `Zip Code`) %>%
rename_all(tolower) %>%
clean_names() %>%
group_by(zip_code) %>%
tally()%>%
dplyr::rename(zip = zip_code) %>%
dplyr::rename("grocery_stores" = "n")%>%
mutate(zip = as.factor(zip))
schools <- schools %>%
dplyr::select(`Location 1`, CENSUS_TRACT, NTA) %>%
rename_all(tolower) %>%
clean_names() %>%
mutate(postcode = sub(".*NY ", "", location_1), .before=1) %>%
mutate(postcode = sub("(\n).*", "", postcode), .before=1)%>%
mutate(new_coord = str_extract(location_1, "\\([^()]+\\)"), .before=2) %>% # first "(lat, lon)" match as a character vector, not a list
mutate(new_coord = substring(new_coord, 2, nchar(new_coord)-1), .before=2) %>%
separate(new_coord, into = c("latitude", "longitude"), sep = "[,]") %>%
mutate(latitude=as.numeric(latitude),
longitude = as.numeric(longitude)) %>%
dplyr::select(-c(location_1))%>%
dplyr::rename(zip = postcode)%>%
mutate(zip = as.factor(zip)) %>%
group_by(zip) %>%
tally()%>%
dplyr::rename(schools = n)
green_space <- green_space %>%
dplyr::select(GlobalID, ZIPCODE) %>%
rename_all(tolower) %>%
clean_names() %>%
group_by(zipcode) %>%
tally()%>%
dplyr::rename(zip = zipcode)%>%
dplyr::rename(parks = n)%>%
mutate(zip = as.factor(zip))
evictions<- evictions %>%
dplyr::select(BOROUGH, `Eviction Postcode`, `Census Tract`, NTA) %>%
rename_all(tolower) %>%
clean_names() %>%
group_by(eviction_postcode) %>%
tally() %>%
dplyr::rename(zip = eviction_postcode)%>%
dplyr::rename(evictions = n)%>%
mutate(zip = as.factor(zip))
access_income<- access_income %>%
rename_all(tolower) %>%
clean_names() %>%
mutate(median_income = str_remove(median_income, "[$]")) %>%
mutate(median_income = str_remove(median_income, "[,]")) %>%
mutate(median_income = as.double(median_income)) %>%
mutate(zip = as.factor(zip))
demographics <- demographics %>%
dplyr::select(`JURISDICTION NAME`, `PERCENT FEMALE`, `PERCENT PACIFIC ISLANDER`, `PERCENT HISPANIC LATINO`, `PERCENT AMERICAN INDIAN`, `PERCENT WHITE NON HISPANIC`, `PERCENT ASIAN NON HISPANIC`, `PERCENT BLACK NON HISPANIC`, `PERCENT OTHER ETHNICITY`, `PERCENT ETHNICITY UNKNOWN`, `PERCENT US CITIZEN`, `PERCENT RECEIVES PUBLIC ASSISTANCE`, `PERCENT NRECEIVES PUBLIC ASSISTANCE`) %>%
rename_all(tolower) %>%
clean_names() %>%
mutate(percent_nonwhite = 1-percent_white_non_hispanic, .before=2)%>%
dplyr::rename(zip = jurisdiction_name)%>%
mutate(zip = as.factor(zip))
rental_price <- rental_price %>%
rename_all(tolower) %>%
clean_names()
# clean_names() prefixes names that start with a digit with "x"; strip only that leading "x"
names(rental_price) <- names(rental_price) %>%
stringr::str_remove("^x")
rental_price_clean <- rental_price %>%
filter(areatype=="neighborhood") %>%
dplyr::select(areaname, starts_with("2021")) %>%
na.omit() %>%
rowwise(areaname) %>%
filter(!grepl("All ",areaname)) %>%
mutate(yearly_median = rowMeans(na.rm = TRUE, across(where(is.numeric)))) %>%
dplyr::select(areaname, yearly_median)
cleaned_nyc <- access_income %>%
filter(neighborhood != "AVERAGE NYC NEIGHBORHOOD") %>%
left_join(., demographics, by="zip") %>%
left_join(., evictions, by="zip") %>%
left_join(., schools, by="zip") %>%
left_join(., green_space, by="zip") %>%
left_join(., grocery, by="zip") %>%
mutate(grocery_stores = ifelse(is.na(grocery_stores), 0, grocery_stores)) %>%
filter(!is.na(percent_nonwhite)) %>%
mutate(modzcta = zip, .before=1)
# cleaned_nyc %>%
# dplyr::select(neighborhood, ZIPCODE) %>%
# fuzzyjoin::stringdist_inner_join(., rental_price_clean, by=c("neighborhood"="areaname"))
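Because `cleaned_nyc` is built from a chain of left joins and then filtered on non-missing `percent_nonwhite`, it may be worth checking how many zip codes fail to match before they are silently dropped. The sketch below uses two hypothetical toy tables (`access_toy`, `demo_toy`) in place of the real data, with character zip codes for simplicity.

```r
library(dplyr)

# Hypothetical stand-ins for access_income and demographics (toy values).
access_toy <- tibble(
  zip = c("10001", "10002", "10003"),
  median_income = c(88000, 36000, 112000)
)
demo_toy <- tibble(
  zip = c("10001", "10003", "11201"),
  percent_nonwhite = c(0.42, 0.31, 0.55)
)

# Rows of the left table with no match on the right; these get NA after left_join().
unmatched <- anti_join(access_toy, demo_toy, by = "zip")

# Same join pattern as above, followed by a count of rows the NA filter would drop.
joined <- left_join(access_toy, demo_toy, by = "zip")
n_dropped <- sum(is.na(joined$percent_nonwhite))

print(unmatched)
print(n_dropped)
```

Running the same `anti_join()` against each table in the chain would show which source is responsible for most of the missing zip codes.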
library(sf)
# read shapefiles of NYC Zip Codes
boundary_nyc <- st_read(here("ethnic", "Data", "modzcta_bounds","geo_export_27a93715-f9a7-451d-bf54-fe2faa45cb1e.shp"),
quiet = TRUE)
library(zipcodeR)
# read ridership
transit_points <- read_csv(here("transit","ridership_points.csv"))%>%
separate(Position, into=c("Point", "longitude", "latitude"), " ") %>%
mutate(latitude = str_remove_all(latitude, "[)]"),
longitude = str_remove_all(longitude, "[()]"),
) %>%
dplyr::select(-c(Point)) %>%
mutate(latitude = as.numeric(latitude),
longitude = as.numeric(longitude)) %>%
st_as_sf(coords = c("longitude", "latitude"), crs = st_crs(boundary_nyc), agr = "constant")
nyc_join <- merge(boundary_nyc, cleaned_nyc)
transit_ridership_neighborhood <- transit_points %>% mutate(
intersection = as.integer(st_intersects(geometry, nyc_join)),
modzcta = if_else(is.na(intersection), "", nyc_join$modzcta[intersection]), .before=1
) %>%
rownames_to_column(.) %>%
group_by(modzcta) %>%
dplyr::summarise(stops = n(),
# note: despite its name, median_ridership stores the mean of 2018 ridership
median_ridership = mean(`2018Ridership`)) %>%
filter(modzcta != "") %>%
data.frame() %>%
dplyr::select(-c(geometry))
full_nyc <- merge(nyc_join, transit_ridership_neighborhood, all=TRUE)
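The point-in-polygon step above can also be written with `sf::st_join()`, which attaches polygon attributes to each point directly. This is a minimal sketch on hypothetical toy geometry (two unit squares and four points, with no CRS set), not the project's data. With real data, both layers should share one CRS, as they do in the code above.

```r
library(sf)
library(dplyr)

# Helper for the toy example: a unit square whose lower-left corner is (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two hypothetical "zip code" polygons.
zones <- st_sf(modzcta = c("A", "B"), geometry = st_sfc(sq(0), sq(1)))

# Hypothetical stops; the last point falls outside both polygons.
stop_pts <- st_as_sf(
  data.frame(
    ridership = c(100, 300, 50, 10),
    x = c(0.2, 0.7, 1.5, 5),
    y = c(0.5, 0.5, 0.5, 5)
  ),
  coords = c("x", "y")
)

per_zone <- stop_pts %>%
  st_join(zones, join = st_intersects) %>%  # adds modzcta (NA when no polygon matches)
  st_drop_geometry() %>%
  filter(!is.na(modzcta)) %>%
  group_by(modzcta) %>%
  summarise(stops = n(), mean_ridership = mean(ridership))

print(per_zone)
```

Compared with indexing into `nyc_join$modzcta` by hand, this keeps the unmatched points visible as `NA` rows until they are explicitly filtered out.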
# simple plot shows all locations
library(s2)
#plot locations over map
subway_loc <- ggplot() +
geom_sf(data = nyc_join, fill = "#EBF6FF", color = "#D48DD8", size = 0.15, alpha = .2) +
geom_sf(data = transit_points, color="#3F123C", size=2) +
coord_sf(datum = st_crs(boundary_nyc)) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Subway Stop Locations \nin NYC")+
theme(#panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
stops <- full_nyc %>%
ggplot() +
geom_sf(aes(fill = stops), color = "#8f98aa") +
scale_fill_gradient(low = "#EBF6FF", high = "#BC24B0",
guide = guide_legend(title = "Number of Stops") ,na.value="#D6D6D6") +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Subway Stop Counts \nin NYC")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
ridership <- full_nyc %>%
ggplot() +
geom_sf(aes(fill = median_ridership), color = "#8f98aa") +
scale_fill_gradient(low = "#EBF6FF", high = "#BC24B0",
guide = guide_legend(title = "Mean Ridership") ,na.value="#D6D6D6") +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Mean Subway Turnstile \nRidership in 2018 \nfor NYC")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
library(egg)
ggarrange(subway_loc, stops, ridership, ncol=3)
red <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_female), color = "#8f98aa") +
scale_fill_gradient(low = "#FCF5EE", high = "#E13728", guide = guide_legend(title = "Percent Female")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Female Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
yellow <- ggplot(full_nyc) +
geom_sf(aes(fill = median_income), color = "#8f98aa") +
scale_fill_gradient(low = "#FCF5EE", high = "#F3D24E", guide = guide_legend(title = "Median Income")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Median Income")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
teal <- ggplot(full_nyc) +
geom_sf(aes(fill = jobs_accessible), color = "#8f98aa") +
scale_fill_gradient(low = "#FCF5EE", high = "#2DBDC7", guide = guide_legend(title = "Number of Jobs")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Job Access via Public Transit")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
purple <- ggplot(full_nyc) +
geom_sf(aes(fill = evictions), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#7826C0", guide = guide_legend(title = "Number of Evictions")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Evictions")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
orange <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_receives_public_assistance), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#FC9228", guide = guide_legend(title = "Percent on \nPublic Assistance")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Public Assistance")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
green <- ggplot(full_nyc) +
geom_sf(aes(fill = grocery_stores), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#326902", guide = guide_legend(title = "Number of Stores")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Retail Food Stores")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
yellow_green <- ggplot(full_nyc) +
geom_sf(aes(fill = parks), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#939E28", guide = guide_legend(title = "Number of Parks")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Green Space")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
blue <- ggplot(full_nyc) +
geom_sf(aes(fill = schools), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#5372C4", guide = guide_legend(title = "Number of Schools")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Number of Schools")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
pink <- ggplot(full_nyc) +
geom_sf(aes(fill = pop_est), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#F450E1", guide = guide_legend(title = "Number of People")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
ggarrange(red, orange, yellow, yellow_green, green, teal, blue, purple, pink, ncol=3)
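The nine map panels above differ only in the fill variable, the high color, and the labels, so the repeated code could be wrapped in a small helper. The function below, `plot_choropleth()`, is a hypothetical helper and not part of the original analysis. It is shown on a toy `sf` object rather than `full_nyc`, and it omits the `guides(shape = ..., color = ...)` calls because these maps use only a fill scale.

```r
library(ggplot2)
library(sf)

# Hypothetical helper: one choropleth per call, styled like the panels above.
plot_choropleth <- function(data, var, high, legend_title, title, title_size = 13) {
  ggplot(data) +
    geom_sf(aes(fill = .data[[var]]), color = "#8f98aa") +
    scale_fill_gradient(low = "#FCF5EE", high = high,
                        guide = guide_legend(title = legend_title)) +
    theme_minimal() +
    theme(panel.grid.major = element_line("transparent"),
          axis.text = element_blank(),
          plot.title = element_text(size = title_size, face = "bold"),
          legend.title = element_text(size = 8),
          legend.text = element_text(size = 8)) +
    ggtitle(title)
}

# Toy geometry standing in for full_nyc: two unit squares with a made-up count.
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}
toy <- st_sf(parks = c(3, 7), geometry = st_sfc(sq(0), sq(1)))

p <- plot_choropleth(toy, "parks", "#939E28", "Number of Parks", "Green Space")
```

With such a helper, a panel like `green` would reduce to a single call, for example `plot_choropleth(full_nyc, "grocery_stores", "#326902", "Number of Stores", "Retail Food Stores")`.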
Next, we use the same dataset to look at how our population demographic outcomes vary by neighborhood.
white <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_white_non_hispanic), color = "#8f98aa") +
scale_fill_gradient(low = "#FCF5EE", high = "#7B435B", guide = guide_legend(title = "Percent White")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("White Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 15, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
black <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_black_non_hispanic), color = "#8f98aa") +
scale_fill_gradient(low = "#FCF5EE", high = "#F25F5C", guide = guide_legend(title = "Percent Black")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Black Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 15, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
asian <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_asian_non_hispanic), color = "#8f98aa") +
scale_fill_gradient(low = "#FCF5EE", high = "#717EC3", guide = guide_legend(title = "Percent Asian")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Asian Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 15, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
latinx <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_hispanic_latino), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#FC9A38", guide = guide_legend(title = "Percent Latinx")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Latinx Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 15, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
native <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_american_indian), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#5DA271", guide = guide_legend(title = "Percent Native")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Native American Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 15, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
pacific <- ggplot(full_nyc) +
geom_sf(aes(fill = percent_pacific_islander), color = "#8f98aa")+
scale_fill_gradient(low = "#FCF5EE", high = "#16BAC5", guide = guide_legend(title = "Percent Pacific Islander")) +
theme_minimal() +
theme(panel.grid.major = element_line("transparent"),
axis.text = element_blank()) +
ggtitle("Pacific Islander Population")+
theme(panel.grid.major = element_line("transparent"),
plot.title = element_text(size = 15, face = "bold"),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)) +
guides(shape = guide_legend(override.aes = list(size = 4)),
color = guide_legend(override.aes = list(size = 4)))
ggarrange(white, black, latinx, asian, native, pacific, ncol=2)
Challenge: What do you anticipate to be the difficulty level of your current project? What new tools/skills will you need to pick up for this project?
The largest conceptual hurdle will be learning how to use hierarchical models and understanding when they’re appropriate. However, the main practical burden of this project will be cleaning and joining our disparate datasets.
Creativity: In what ways will your project be unique? Can you imagine creative visualizations/presentations/modes of communication/modes of distribution that will enhance your project?
Previous studies have measured NYC’s racial segregation and neighborhood-specific racial composition, but almost universally under a frequentist mode of analysis. Our project is thus atypical within urban studies of NYC. The atypicality extends to our project format as well. Because our major goal is to communicate these justice issues to a broader public, we want to create a zine or pamphlet that explains the racial composition of NYC and how structural issues are associated with demographics.
Teamwork: Agree upon and identify the following.
We will meet for an hour on MWF afternoons. We currently share documents and data via a shared Google Drive, but intend to use a GitHub repository to share code so that we all stay up to date.
When presenting our work, our intended audience is students who have taken 155, largely because many non-statisticians and non-mathematicians attend Capstone days, and most have not taken 454. When writing up our work, our intended audience is students who may have encountered statistics or geography, but do not necessarily have extensive background knowledge. Again, because our group is fairly large, we are hoping to use two formats to present our work and sub-work: